Improving Structured Grid-Based Sparse Matrix-Vector Multiplication and Gauss–Seidel Iteration on GPDSP

Authors

Abstract

Structured grid-based sparse matrix-vector multiplication (SpMV) and Gauss–Seidel iterations are very important kernel functions in scientific and engineering computations, and both are memory-intensive and bandwidth-limited. GPDSP is a general-purpose digital signal processor, a significant embedded processor that has been introduced into high-performance computing. In this paper, we designed various optimization methods for structured grid-based SpMV and Gauss–Seidel iterations on GPDSP: a blocking method to improve data locality and increase memory access efficiency, a multicolor reordering method to develop fine-grained parallelism in Gauss–Seidel, a data partitioning method designed for the GPDSP memory structures, and a double buffering method to overlap computation and memory access. Finally, we combined the above optimization methods to design a multicore vectorization algorithm. We tested matrices generated from structured grids of different sizes on the GPDSP platform and obtained speedups of up to 41× and 47× compared with the unoptimized SpMV and Gauss–Seidel iterations, with maximum bandwidth efficiencies of 72% and 81%, respectively. The experimental results show that our algorithms could fully utilize the external memory bandwidth. We also implemented the commonly used mixed-precision algorithm on GPDSP and obtained speedups of 1.60× and 1.45×, respectively.

Similar resources

On improving the performance of sparse matrix-vector multiplication

We analyze single-node performance of sparse matrix-vector multiplication by investigating issues of data locality and fine-grained parallelism. We examine the data-locality characteristics of the compressed-sparse-row representation and consider improvements in locality through matrix permutation. Motivated by potential improvements in fine-grained parallelism, we evaluate modified sparse-matrix re...

Reconfigurable Sparse Matrix-Vector Multiplication on FPGAs

executing memory-intensive simulations, such as those required for sparse matrix-vector multiplication. This effect is due to the memory bottleneck that is encountered with large arrays that must be stored in dynamic RAM. An FPGA core designed for a target performance that does not unnecessarily exceed the memory imposed bottleneck can be distributed, along with multiple memory interfaces, into...

Optimizing Sparse Matrix Vector Multiplication on SMPs

We describe optimizations of sparse matrix-vector multiplication on uniprocessors and SMPs. The optimization techniques include register blocking, cache blocking, and matrix reordering. We focus on optimizations that improve performance on SMPs, in particular, matrix reordering implemented using two different graph algorithms. We present a performance study of this algorithmic kernel, showing ho...

Sparse Matrix-Vector Multiplication on FPGAs

Floating-point Sparse Matrix-Vector Multiplication (SpMXV) is a key computational kernel in scientific and engineering applications. The poor data locality of sparse matrices significantly reduces the performance of SpMXV on general-purpose processors, which rely heavily on the cache hierarchy to achieve high performance. The abundant hardware resources on current FPGAs provide new opportunities to...


Journal

Journal title: Applied Sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13158952